YouTube videos: CUDA Inference
Nvidia CUDA in 100 Seconds
Nvidia CUDA vs Apple Metal for AI Work
Running AI on FreeBSD (the CUDA problem)
Maximize LLM Inference Performance + Auto-Profile/Optimize PyTorch/CUDA Code
What is CUDA? - Computerphile
Optimizing LLM Inference: Asynchronous Continuous Batching with CUDA Streams
Writing Code That Runs FAST on a GPU
Analyzing Deepseek's "undefined" NVIDIA PTX optimizations (with benchmarks!)
CUDA Programming Course – High-Performance Computing with GPUs
Mastering LLM Inference Optimization: From Theory to Cost-Effective Deployment: Mark Moyou
Understanding the LLM Inference Workload - Mark Moyou, NVIDIA
Demo - Chatbot Response Acceleration with CUDA LLM Inference
FASTER Inference with Torch TensorRT Deep Learning for Beginners - CPU vs CUDA
Dual RTX 5090s Destroy AI Benchmarks Ollama, CUDA Burn & 34B Model
Must Know Technique in GPU Computing | Episode 4: Tiled Matrix Multiplication in CUDA C
Stillwaters AI - LLM Systems Engineering | Inference, CUDA Memory, Tensor Cores, Observability, HPC